It's clear to most intellectual property (IP) creators and users today that functional verification is one of the biggest problems facing the industry. As design complexity increases, the verification effort often overshadows the design effort. For complex IP, such as the CPU cores developed by ARM Ltd. (Cambridge, U.K.), it becomes ever harder to detect obscure design errors while still providing the highest possible verification coverage. Most will agree that no single methodology or tool solves what many are dubbing the "verification crisis." A mix of methodologies is therefore required to tackle the problem from as many angles as possible. This is the approach we have adopted at ARM within the CPU Validation Group: a mix of deterministic simulation, random stimulus generation, and automatic testbench generation with Specman Elite from Verisity (Mountain View, CA).
Deterministic simulation
The mainstay methodology, which we have used since the early days of the first ARM CPU designs, is deterministic simulation. This is a common and well-understood methodology that offers a number of advantages, although it's limited by the effort required to generate test cases and by the performance of simulation tools. At ARM, we develop test cases as self-checking assembler sequences. We then replay these code sequences on a simple simulation testbench consisting of the ARM CPU, a simple memory model, and some simple memory-mapped peripherals. Our tests fall into two categories: AVS (Architecture Validation Suites) and DVS (Device Validation Suites). The AVS tests check architectural functionality such as the instruction set architecture (32-bit ARM and 16-bit Thumb), the exception model, and the debug architecture. The DVS tests focus on the behavior of specific cores and check corner cases arising from the particular implementation. An advantage of this type of test case is that the tests are self-contained and portable from ISS (Instruction Set Simulator) environments to Verilog or VHDL testbench environments, to FPGA prototypes, and eventually to silicon. Thus both we and our customers can verify the functional equivalence of all these design views. These suites of tests are effectively the ARM architecture compliance suites.
We believe that the coverage attained by this deterministic approach is very high, but not complete, so we need to apply other methodologies to fill the coverage holes. Often, running real application code or booting an OS on the simulation model will flush out cases missed by the above test generation approaches. But this approach is limited by simulation speed: the number of cycles required to run even less than a second of real code is vast, making such simulations impractical for day-to-day regression testing. Additionally, finding and debugging failures becomes more difficult with this type of test case. That said, booting a real RTOS such as Windows CE on the simulation model is a great demonstration that the CPU core is actually working.
We use code coverage tools such as Verisity's SureCov to measure the effectiveness of our verification code. Analyzing the results enables us to identify areas of the Verilog code that are either redundant or not covered by our verification suites. A word of warning, though: 100 percent code coverage doesn't mean that your verification is complete. For that, you need a deeper understanding of the functional coverage of your design.
Other non-simulation techniques, such as formal verification and FPGA prototyping, help to complete the picture by adding useful coverage of the CPU device (see Figure 1). Of course, there are other widely used methodologies, such as hardware emulation and hybrid formal/simulation tools, but these aren't the topic of this discussion.
Random stimulus generation
Random stimulus generation is widely recognized as an effective approach for verifying corner cases that are hard to anticipate. We found that, while most design bugs are flushed out by the deterministic approach, random instruction sequences are also highly effective at hitting obscure cases, often finding bugs that might otherwise lie undetected for years in real-life applications. We have an internal tool, known as RIS, that generates targeted random code sequences. With RIS, we pre-generate self-checking tests using an ISS as the reference design. This technique won't catch design errors that are present in both the ISS and the HDL model, but in practice this situation is rare, and once enough sequences have been simulated, these sequences are likely to expose design errors in either model.
As effective as instruction sequences are, running them on the simulation model can't find all types of design bugs, and therefore will miss several classes of problems. For example, like many current ARM CPU devices, the core we were verifying, the ARM946E-S, supports the Advanced Microcontroller Bus Architecture (AMBA) on-chip bus interface. What we found is that it's quite possible to design a testbench around an interface standard such as AMBA that allows the AVS and DVS tests to pass even though the bus protocol is violated. Therefore, we needed a different testbench approach to find this type of problem. We incorporated automated testbenches into our environment, using Verisity's Specman Elite tool to architect a sophisticated testbench for the verification of the ARM946E-S CPU core model. Using the e language, we developed an automated self-checking testbench that stresses the ARM946E-S Verilog behavioral model to efficiently expose corner cases that are difficult to reach by running code sequences on the model.
The ARM946E-S is based on the ARM9E-S CPU core. For the purposes of verifying the ARM946E-S itself, we treated the ARM9E-S core as third-party IP that had previously been verified by ARM. Our objective was to efficiently verify the memory sub-system and control logic of the ARM946E-S, and running code sequences wasn't the most efficient approach for this. We therefore developed an e testbench that uses a bus functional model (BFM) of our ARM9E-S core to drive pseudo-random transactions onto the ARM9E-S instruction and data interfaces. Behavioral e models of the ARM946E-S sub-systems, such as the caches and system controller, predict the effect of these randomly generated transactions, and a system of checkers ensures correct behavior of the Verilog RTL code, making a fully self-checking testbench. By altering the constraints on our BFM transaction generator, we targeted the generation at particular areas of the design, and we were able to soak test the model by using different random seeds. The rest of this article discusses the architecture of our e testbench in more detail.
Architecture overview
Before proceeding to describe the details of the e testbench, let's set the scene for the design that we are verifying, the ARM946E-S (see Figure 2).
The ARM946E-S is a synthesizable macrocell combining an ARM9E-S processor core with instruction and data caches, tightly coupled instruction and data SRAM with protection units, a write buffer, and an AMBA AHB (Advanced High-performance Bus) interface. The caches are 4-way set associative, with data cache support for write-back and write-through policies and lockdown on a per-set basis for both instruction and data caches. The tightly coupled memories are programmable in size and offset, though the ISRAM is fixed at address 0x00000000. The protection unit provides eight programmable, overlapping regions that allow the programmer to define per-region cache and write buffer properties and access permissions. Finally, a write buffer can optionally buffer writes to external memory to reduce stalls due to external memory latency. The ARM9E-S core is a 32-bit RISC processor based on the ARM9TDMI core. It adds signal processing extensions to the ARM instruction set and a single-cycle 16x32 multiply-accumulate (MAC) unit.
The ARM946E-S memory system architecture left us with many memory boundary and corner cases to consider from a verification point of view. For example, nothing prevents the core user from placing both code and data in the ISRAM address space; thus the ARM946E-S must provide a path for data accesses to the ISRAM. This generates some interesting cases for a Harvard architecture when the ARM9E-S core requests a data access and an instruction fetch in the same cycle. Sequential transfers may also cross any of the boundaries between the ISRAM, the DSRAM, and external memory. And we had to consider the cases where external accesses provoke cache linefills, or cache line cast-outs when the cache is in write-back mode.
All of these behaviors combine to create a large state-space that must be verified. For the ARM946E-S we found that a combined approach of the AVS, DVS and RIS code sequences, coupled with a sophisticated pseudo-random e testbench provided an excellent overall verification solution.
An automatic testbench
The automated testbench for the ARM946E-S compares the activities occurring in the ARM946E-S Verilog model with the activities predicted by an e reference model of the ARM946E-S (the system behavioral model).
The testbench is composed of a series of e modules that drive and monitor the Verilog model of the ARM946E-S (the design-under-test). These modules can be classified as generators, reference models, checkers, and coverage modules. The generators generate stimuli, which are applied to both the Verilog model and the reference model; a system of checkers then compares the behaviors. Finally, coverage modules tell us how well we are doing, both in the generation and in the coverage of the Verilog design-under-test (see Figure 3).
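To make this structure concrete, the following e sketch shows how such an environment might be assembled. The unit names are ours for illustration, not the actual testbench code, and the stub units stand in for the real modules:

<'
-- Empty stubs standing in for the real modules, to show the shape only.
unit arm9es_bfm {};
unit ahb_memory {};
unit arm946_ref {};
unit arm946_sb {};
unit ahb_protocol_checker {};

unit arm946_env {
    bfm      : arm9es_bfm is instance;           -- generator: I/D stimulus
    ahb_mem  : ahb_memory is instance;           -- generator: slave responses
    ref_model: arm946_ref is instance;           -- predicts expected activity
    sb       : arm946_sb is instance;            -- checks actual vs. expected
    prot_chk : ahb_protocol_checker is instance; -- independent protocol rules
};

extend sys {
    env: arm946_env is instance;
};
'>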
The testbench contains two generators: an ARM9E-S BFM and an AHB memory model. The BFM replaces an ARM9E-S core in the testbench and generates pseudo-random instruction and data activities. The AHB memory model replaces external memory and generates pseudo-random AHB responses to external memory requests.
To make the BFM as reusable as possible, we divided it internally into an instruction generator and an instruction activity driver. The instruction generator produced ARM9E-S instructions, which were then converted and played out by the instruction activity driver.
The instruction generator didn't generate complete information about any instruction; instead, it generated just the minimum information needed to create ARM9E-S bus activities. The information generated by the instruction generator depended only on the ARM Instruction Set Architecture (ISA).
The instruction activity driver processed the instructions to create the ARM9E-S pipeline activities and played these pipeline activities out on the ARM9E-S bus. All the device architecture information, such as pipeline stages and timing properties, is maintained only within the instruction activity driver, where it is used to convert the instructions into pipeline activities.
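The split might look like the following sketch, with invented type and signal names: the generator produces ISA-level fields only, while the driver owns all pipeline timing.

<'
type instr_kind: [DATA_OP, LOAD, STORE, BRANCH];

-- ISA-level information only: just enough to create bus activity.
struct gen_instr {
    kind   : instr_kind;
    address: uint;           -- target address for fetches and data accesses
    size   : uint;
    keep size in [1, 2, 4]; -- byte, halfword, word
};

-- The activity driver owns all pipeline and timing knowledge.
unit instr_driver {
    event clk is rise('tb_top.clk') @sim;

    drive(i: gen_instr) @clk is {
        -- convert the instruction into ARM9E-S pipeline activity here;
        -- pipeline stages and timing live only inside this unit
        wait cycle;
    };
};
'>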
Extending the BFM
Partitioning the BFM into the instruction generator and activity driver made the BFM very easy to extend: writing a new instruction activity driver was enough to create a BFM for a new device with the same ISA. Adding new instructions, however, required changes to both the instruction generator and the instruction activity driver.
The ARM9E-S BFM was also designed so that it can be replaced with an equivalent ARM9E-S core to run the AVS and DVS test suites on the testbench.
To support this capability, we incorporated core activity snoopers in the testbench. These snoopers watch the hardware signals at the CPU core interface and determine the activities taking place there. The snoopers then pass these activities to the reference model, which uses the information to predict the result of the CPU activities.
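A snooper of this kind can be sketched in e as a unit that samples the core interface on each clock and forwards a decoded activity. The signal names below are illustrative only, not the actual interface wiring:

<'
struct core_activity {
    addr : uint;
    write: bool;
};

unit core_snooper {
    event clk is rise('tb_top.clk') @sim;

    run() is also {
        start monitor();
    };

    monitor() @clk is {
        while TRUE {
            if 'tb_top.dnmreq' == 0 then {  -- illustrative: data request, active low
                var a: core_activity = new;
                a.addr  = 'tb_top.da';
                a.write = ('tb_top.dnrw' == 1);
                -- forward to the reference model (hypothetical hook):
                -- ref_model.notify(a);
            };
            wait cycle;
        };
    };
};
'>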
The stimulus generated by the BFM is controlled by constraints contained in a constraint file, which the BFM reads at startup. The constraints determine the following (a sketch of such a file appears after this list):
1. The mix of instructions generated by the instruction generator
2. Addresses to which instruction fetches are issued by the BFM
3. Addresses to which memory operations are issued by the BFM
4. Various properties such as the size and direction (read or write) of memory transactions
5. Processor modes in which the BFM generates activities
6. Initial activities generated by the BFM after RESET
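Such a constraint file is simply an e file that extends the generated types. The weights and ranges below are invented for illustration, not ARM's actual settings (gen_instr is the struct from the earlier sketch):

<'
extend gen_instr {
    -- bias the instruction mix toward memory operations
    keep soft kind == select {
        35: LOAD;
        35: STORE;
        20: DATA_OP;
        10: BRANCH;
    };
    -- steer accesses toward an ISRAM/external memory boundary
    keep soft address in [0x0..0x3FFF];
};
'>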
The BFM made extensive use of the constraint-solving and generation engine in Specman Elite. It also made extensive use of object-oriented features such as inheritance, which the e language supports: every instruction is defined as a subtype of the same base class.
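In e, such a base class is naturally expressed with when subtypes. A minimal illustration, again with invented fields:

<'
extend gen_instr {
    when BRANCH'kind gen_instr {
        link  : bool;   -- BL versus B, for example
        offset: int;
        keep offset in [-1024..1024];
    };
    when LOAD'kind gen_instr {
        sign_extend: bool;
    };
};
'>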
Models applied
We replaced the external memory in the testbench with the AHB memory model. The external memory sits on the AHB bus as an AHB slave. The AHB standard requires every AHB slave to respond to an AHB transaction with an AHB response, which can be OKAY, ERROR, RETRY, or SPLIT. Additionally, an AHB slave can introduce wait states if it's unable to service an AHB transaction in one cycle.
The AHB memory model in the testbench produces pseudo-random responses to AHB transactions and introduces random wait states while responding. The responses are generated based on constraints read from a constraint file at startup.
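A sketch of such a response in e might look like this; the response names follow the AHB spec, but the struct, field names, and weights are our illustration:

<'
type ahb_resp: [OKAY, ERROR, RETRY, SPLIT];

struct slave_response {
    resp       : ahb_resp;
    wait_states: uint;
    keep wait_states in [0..16];

    -- mostly OKAY, with occasional error/retry/split responses
    keep soft resp == select {
        85: OKAY;
        5 : ERROR;
        5 : RETRY;
        5 : SPLIT;
    };
};
'>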
The AHB memory model also incorporates activity snoopers, which pass on activities occurring at the AHB memory to the scoreboard. The scoreboard is the module in the testbench where the validity of activities in the Verilog model is checked.
We used the reference model to predict the results of any stimuli generated by the BFM and the AHB memory model (the generators). The reference model is a fully functional behavioral model of the ARM946E-S implemented in e. It is implemented at a higher level of abstraction than the Verilog model and is not cycle accurate.
Internally, the reference model contains fully functional models of the ARM946E-S components: the instruction and data caches, the instruction and data tightly coupled memories, the protection unit, the write buffer, the system controller, and the BIU (bus interface unit).
As we already mentioned, the reference model predicts the results of stimuli produced by the generators. It uses a set of activity snoopers to determine the stimuli generated by the BFM and AHB models, processes the snooped stimuli, and predicts the activities expected in the Verilog model. It then adds the predicted activities to the scoreboard in the testbench, where each predicted activity is checked against the actual activity in the Verilog model.
To increase code reusability, we made all the models within the reference model timing-unaware, except one. To increase the extensibility of the code, we made extensive use of the inheritance supported in e: for every model, we defined a generic base version, so that models with properties unique to the ARM946E-S could be inherited from the base versions.
As we mentioned, the scoreboard is the module in the testbench where the validity of any activity in the Verilog model is checked. Activities predicted by the reference model are added to a list of expected transactions in the scoreboard. Using a set of activity snoopers, the scoreboard observes the actual activities occurring in the HDL model and checks them against the list of expected activities. All expected and actual activities left unmatched in the scoreboard are printed out at the end of the simulation.
Within the scoreboard, we made extensive use of the list data structure provided by e; we found the list search methods particularly useful for implementing the scoreboard.
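A condensed scoreboard sketch follows, extending the arm946_sb stub from the environment sketch and assuming a simple transfer struct of our own invention; first_index() is one of the built-in list search methods we found useful:

<'
struct transfer {
    addr: uint;
    data: uint;
};

extend arm946_sb {
    !expected: list of transfer;

    -- called by the reference model with a predicted activity
    add_expected(t: transfer) is {
        expected.add(t);
    };

    -- called by the HDL-side snoopers with an actual activity
    check_actual(t: transfer) is {
        var idx: int = expected.first_index(it.addr == t.addr and
                                            it.data == t.data);
        if idx == UNDEF then {
            dut_error("unexpected transfer at address ", t.addr);
        } else {
            expected.delete(idx);
        };
    };

    -- anything still listed at the end of the run was never matched
    report_unmatched() is {
        for each (t) in expected do {
            out("unmatched expected transfer at address ", t.addr);
        };
    };
};
'>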
We made extensive use of the temporal expressions supported in e in the AHB protocol checker. The AHB protocol checker checks that an AHB master (here, the ARM946E-S) conforms to the protocol specified in the AMBA AHB standard, flagging an error every time an AHB master produces a transaction that doesn't comply with the standard.
We made the AHB protocol checker a totally independent module in the testbench: it doesn't communicate with any other module, so it can also be used to check for protocol violations in a deterministic testbench.
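As one illustrative rule (not the full checker), the AHB spec requires a RETRY response to span exactly two cycles: one with HREADY low, then one with HREADY high. In e this becomes a one-line temporal check; the signal paths and the HRESP encoding of RETRY as 2 reflect our assumptions about the testbench wiring, and the unit extends the earlier stub:

<'
extend ahb_protocol_checker {
    event hclk is rise('tb_top.hclk') @sim;

    -- first and second cycles of a two-cycle RETRY response (HRESP == 2)
    event retry_c1 is true('tb_top.hresp' == 2 and 'tb_top.hready' == 0) @hclk;
    event retry_c2 is true('tb_top.hresp' == 2 and 'tb_top.hready' == 1) @hclk;

    expect retry_two_cycles is @retry_c1 => {@retry_c2} @hclk
        else dut_error("RETRY response not completed in two cycles");
};
'>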
In its present form, the testbench's support for collecting coverage is very basic: we collect coverage on the instructions generated by the BFM. For future projects, we plan to use functional coverage to collect information about operations on the caches, the tightly coupled memories, the BIU, and the AHB memory. By using coverage extensively, we can find holes in our simulations and subsequently plug them, which in turn will help us develop a more complete set of test cases.
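In e, such functional coverage is declared with cover groups tied to events. A sketch of the kind of item we have in mind, reusing the invented gen_instr struct from earlier:

<'
extend gen_instr {
    event generated;  -- emitted by the BFM when the instruction is driven

    cover generated is {
        item kind;
        item size;
        cross kind, size;
    };
};
'>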
Full random testing
When running the testbench, we used two different configurations. The first contains the BFM, the AHB memory model, the AHB protocol checker, the reference model, and the scoreboard; this configuration was used for full random testing. For the second, we replaced the BFM with the ARM9E-S core; this configuration was used to run the AVS and DVS test suites.
In both configurations, the testbench reads the generator constraint files, configuration files, and a binary image to load into the AHB memory at startup.
At the end of simulation, the testbench writes out a simulation report, which contains information about all the protocol mismatches identified by the AHB protocol checker and the activity mismatches identified by the scoreboard.
We found the random testbench very useful for finding some of the obscure bugs in our designs; some of the bugs we found would have been very difficult to find with assembly test cases. For example, it's very difficult to check for a repeated write to memory when the data being written is correct every time.
Deterministic versus random
The deterministic approach of the AVS and DVS tests mentioned at the beginning of this article remains the primary verification methodology for ARM CPU cores. Secondary to this is our RIS methodology for random code sequences. However, the automatic testbench outlined above has many advantages that make it possible to detect bugs that would previously have been missed. Developing suites of deterministic tests takes a lot of time and resources; a random methodology allows us to generate useful test cases much earlier in the verification flow, leading to stable designs more quickly. Once the deterministic suites are complete, the random approaches offer useful top-up coverage for those obscure corner cases missed by the test suites. The e language facilitates a software engineering approach to the creation of sophisticated and efficient testbenches. With Specman Elite there is a new programming language to learn and new approaches to adopt, but the effort is well justified if it improves the quality of the final result.
Nitin Sharma is an engineer at ARM Ltd. (Cambridge, U.K.).
Bryan Dickman is the CPU design verification manager for ARM Ltd.